Skip to content

Conversation

@lemorage
Copy link

@lemorage lemorage commented Jan 15, 2026

Which issue does this PR close?

What changes are included in this PR?

Change internal error to user-facing error when function type coercion fails. Add helpful hint when Binary types are used with string functions.

Before:

Internal error: Expect TypeSignatureClass::Native(...) but received NativeType::Binary, DataType: Binary

After:

Error: Function 'split_part' requires String, but received Binary (DataType: Binary).
Hint: Binary types are not automatically coerced to String. Use CAST(column AS VARCHAR) to convert Binary data to String.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jan 15, 2026
@lemorage
Copy link
Author

The fix is prepared, I am not sure if this would be a reasonable change tbh tho. Maybe just setting config is fine? set datafusion.execution.parquet.binary_as_string = true. Need to be discussed before merging.

@Jefffrey
Copy link
Contributor

Personally I'm not sure about this approach as it might suggest all other string functions with similar signature should now be fixed to also ensure they coerce from binary 🤔

e.g.

impl AsciiFunc {
pub fn new() -> Self {
Self {
signature: Signature::coercible(
vec![Coercion::new_exact(TypeSignatureClass::Native(
logical_string(),
))],
Volatility::Immutable,
),
}
}
}

impl BTrimFunc {
pub fn new() -> Self {
Self {
signature: Signature::one_of(
vec![
TypeSignature::Coercible(vec![
Coercion::new_exact(TypeSignatureClass::Native(logical_string())),
Coercion::new_exact(TypeSignatureClass::Native(logical_string())),
]),
TypeSignature::Coercible(vec![Coercion::new_exact(
TypeSignatureClass::Native(logical_string()),
)]),
],
Volatility::Immutable,
),
aliases: vec![String::from("trim")],
}
}
}

impl ContainsFunc {
pub fn new() -> Self {
Self {
signature: Signature::coercible(
vec![
Coercion::new_exact(TypeSignatureClass::Native(logical_string())),
Coercion::new_exact(TypeSignatureClass::Native(logical_string())),
],
Volatility::Immutable,
),
}
}
}

@adriangb
Copy link
Contributor

I guess ClickHouse does coerce them? So is this just a case of our semantics != ClickHouse?

@Jefffrey
Copy link
Contributor

I guess ClickHouse does coerce them? So is this just a case of our semantics != ClickHouse?

It's a similar problem to how Spark will coerce strings to ints for their math functions but we don't.

@alamb
Copy link
Contributor

alamb commented Jan 15, 2026

I guess ClickHouse does coerce them? So is this just a case of our semantics != ClickHouse?

The problem is only with the ClickHouse hits_partitioned dataset (which has 100 parquet files) which was written by an ancient version of pyarrow which did not correctly annotate the string columns as STRING. This is the only usecase for this config:

https://datafusion.apache.org/user-guide/configs.html

datafusion.execution.parquet.binary_as_string false (reading) If true, parquet reader will read columns of Binary/LargeBinary with Utf8, and BinaryView with Utf8View. Parquet files generated by some legacy writers do not correctly set the UTF8 flag for strings, causing string columns to be loaded as BLOB instead.

So TLDR I don't think we need to make split_part work with binary data, as proposed in this PR.

Instead I propose:

  1. Change the internal error to a proper error when running split_part on binary data
  2. Update the benchmarks to properly set the binary as string. I made a PR here: Use correct setting for click bench queries in sql_planner benchmark #19835

@alamb
Copy link
Contributor

alamb commented Jan 15, 2026

Thank you for working on this @lemorage

Did you find out what was causing the internal error? I would expect the query would error with a normal error (about not supporting binary columns)

@Jefffrey
Copy link
Contributor

Did you find out what was causing the internal error? I would expect the query would error with a normal error (about not supporting binary columns)

This is related to

@github-actions github-actions bot added logical-expr Logical plan and expressions and removed sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jan 16, 2026
@lemorage lemorage changed the title fix: allow Binary type coercion in split_part function Improve error message when string functions receive Binary types Jan 16, 2026
@lemorage
Copy link
Author

Thanks folks! Have updated the PR to improve the error msg better instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants